feat(ontology): bO-1..bO-16 — DOLCE/FIBO/QUDT/ZUGFeRD/SKR hydrators + 2 new Tier-C hydrator types #407
…logy)
Hydrate DOLCE+DnS Ultralite as the L1 upper ontology at OGIT::DOLCE_V1.0.
DOLCE is the root of the L1-L4 business-logic DAG (inherits_from: None) —
every downstream L2/L3/L4 hydrator (OWL-Time, PROV-O, QUDT, FIBO-FND, ...)
declares inherits_from: Some(OGIT::DOLCE_V1.0) and resolves against this
hydration via rdfs:subClassOf chains.
Additions:
- data/ontologies/dul.ttl — curated extract of canonical DOLCE+DUL with 244
named entities covering the four upper categories (Endurant/Perdurant/
Quality/Abstract), the DnS role hierarchy (Role/Agent/Patient/Instrument/
Location/TimeInterval), and 60+ object/datatype properties. CC-BY 4.0.
- LICENSES/DOLCE.txt — attribution per CC-BY 4.0.
- crates/lance-graph-ontology/src/hydrators/{mod,owl,dolce}.rs — Pattern-D
scaffolding: OwlHydrator + MetaStructureHydrator trait + ContextBundle +
OntologySlot, plus the DOLCE-specific glue (~50 LOC) with the 17-edge
cascade whitelist (subClassOf, classification, role-binding, part-of,
temporal anchoring).
- OntologyRegistry surface extension: register_bundle / bundle_for /
register_edge_types / edge_types_for / entity_count_for / resolve_iri_in.
- 4 integration tests (dolce_hydrator_smoke, dolce_upper_category_count,
dolce_dns_role_resolution, dolce_edge_whitelist_registered) + 2 owl.rs
unit tests + 2 dolce.rs unit tests; all green (10 new + 54 pre-existing
pass; cargo clippy clean; hydration <1ms in release).
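The inherits_from relationship described in this commit can be pictured as one parent pointer per G slot, with DOLCE as the only root. The sketch below is illustrative only: `Manifest`, `lineage`, and the slot numbers used with it are hypothetical stand-ins, not the crate's actual types or constants.

```rust
use std::collections::HashMap;

/// Hypothetical slot key; the real G-slot type is not shown in this PR text.
pub type Slot = u16;

/// Hypothetical per-ontology record holding only what the sketch needs.
pub struct Manifest {
    pub name: &'static str,
    pub inherits_from: Option<Slot>,
}

/// Follow inherits_from pointers from `start` up to the root and
/// return the names visited, child first.
pub fn lineage(manifests: &HashMap<Slot, Manifest>, start: Slot) -> Vec<&'static str> {
    let mut chain = Vec::new();
    let mut cursor = Some(start);
    while let Some(slot) = cursor {
        // A cycle-free chain can never be longer than the table itself.
        if chain.len() >= manifests.len() {
            break;
        }
        let manifest = match manifests.get(&slot) {
            Some(m) => m,
            None => break, // unknown slot: stop walking
        };
        chain.push(manifest.name);
        cursor = manifest.inherits_from;
    }
    chain
}
```

With a root entry whose `inherits_from` is `None` and one child pointing at it, `lineage` returns the child name followed by the root name; an unknown slot yields an empty chain.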
… DUL naming
Replace the hand-curated DUL.ttl extract with the canonical DOLCE+DnS
Ultralite ontology (v4.2, ~187 KB, 78 classes + 113 object properties
+ 5 datatype properties) as supplied by the user.
Test alignment: canonical DUL deliberately departs from DOLCE-Lite-Plus
naming per its own header ("the names of classes and relations have been
made more intuitive"). The two load-bearing renames for the cognitive-
shader L1 slot are Endurant -> Object and Perdurant -> Event. The four
upper categories in DUL are therefore: Object, Event, Quality, Abstract
(all direct sub-classes of Entity).
Likewise, canonical DUL does NOT define `Patient` / `Instrument` /
`Location` as named classes — those are runtime Concept individuals or
live in extension modules (CoreLegal, IOLite, SystemsLite). The DnS
role-resolution test now anchors on the canonical role hierarchy:
Agent, Concept, Role, Task, Parameter, Goal, Method, Plan, TimeInterval.
Test surface (10 tests total, all green):
- dolce_hydrator_smoke (entity_count > 200 confirmed against canonical)
- dolce_upper_category_count (5 root IRIs incl. Entity)
- dolce_dns_role_resolution + dns_role_anchor_concept_is_present +
dul_extension_role_iris_do_not_resolve (the new test pins the
expectation that Patient/Instrument/Location are NOT canonical classes)
- dolce_edge_whitelist_registered + edge_types_for_unknown_g_returns_none
+ re_register_edge_types_is_idempotent
- 2 owl.rs + 2 dolce.rs unit tests
All 17 edges in the cascade whitelist verified present in canonical DUL.
LICENSES/DOLCE.txt updated to reflect verbatim-canonical usage and the
DOLCE-LP -> DUL rename rationale.
…DT, schema.org
Ship four L2/L3 hydrators on top of the Pattern-D substrate landed in
PR-bO-1 (DOLCE). Each declares inherits_from: Some(OGIT::DOLCE_V1.0)
and reuses the generic OwlHydrator; per-ontology glue is ~50 LOC plus
cascade-edge whitelist.
bO-2 OWL-Time (OGIT::TIME_V1, slot 10):
- data/ontologies/time.ttl (W3C, 103 KB)
- 22-edge whitelist incl. all 13 Allen interval relations
- assertions: TemporalEntity/Instant/Interval/ProperInterval resolve
bO-3 PROV-O (OGIT::PROVO_V1, slot 11):
- data/ontologies/provo.ttl (W3C, 70 KB)
- 21-edge whitelist incl. wasGeneratedBy/used/wasDerivedFrom/
wasAttributedTo/wasAssociatedWith/actedOnBehalfOf
- assertions: Entity/Activity/Agent/Bundle resolve
bO-4 QUDT 2.1 (OGIT::QUDT_V1, slot 12):
- data/ontologies/qudt-core.ttl (137 KB) + qudt-units.ttl (3.85 MB,
~2900 unit individuals)
- Multi-file hydration via new OwlHydrator::hydrate_many; both TTL
artifacts merge into one ContextBundle
- assertions: SI base units (M/SEC/K/A/MOL/CD/KiloGM) resolve
- NOTE: quantitykinds catalogue not included (user-uploaded file was
byte-identical duplicate of units file); follow-up upload needed
bO-8 schema.org (OGIT::SCHEMAORG_V1, slot 13):
- data/ontologies/schemaorg.ttl (1.13 MB, ~1400 named subjects)
- 18-edge whitelist incl. schema-org-specific flexible-typing surface
(domainIncludes / rangeIncludes)
- assertions: Thing/Person/Organization/Place/Event/Product resolve
Infrastructure:
- OwlHydrator::hydrate_many for multi-file ontologies
- 4 new canonical slot tokens (TIME=10, PROVO=11, QUDT=12, SCHEMAORG=13)
- 4 new modules/*/manifest.yaml files declaring inherits_from: dolce
- Entity-code allocation in 500-539 range to clear smb-office (200),
q2-cockpit (300), hubspo (400)
- LICENSES/{OWL-TIME,PROV-O,QUDT,SCHEMAORG}.txt attribution files
- 13 new tests (all green); QUDT hydration of core+units in 0.42s release
- Contract test ALL_G_SLOTS.len assertion loosened from `== 6` to `>= 6`
with explanatory comment about the bO-* L2 series
… Business Entities (BE)
Ship the L3 financial / business ontology hydrators on the Pattern-D
substrate landed in PR-bO-1. FIBO is the second non-Turtle format the
hydrator supports — it ships as RDF/XML across ~111 modular .rdf files
under data/ontologies/fibo-{fnd,be}/. The generic OwlHydrator now
dispatches by extension: .rdf → oxrdfxml, everything else → oxttl.
Substrate changes:
- oxttl upgraded 0.1 → 0.2, oxrdf 0.2 → 0.3 (to share oxrdf with the
new oxrdfxml dep — duplicate-version pulls of oxrdf produce confusing
type mismatches at the parser boundary)
- Added oxrdfxml = "0.2" dep
- OwlHydrator::hydrate_many() now dispatches by file extension via
detect_format(); the trait surface is unchanged
bO-6 FIBO-FND (OGIT::FIBOFND_V1, slot 20):
- data/ontologies/fibo-fnd/ (~59 RDF/XML files, ~1.5 MB)
- entity_count = 2232 after hydration (covers OMG Commons re-imports
+ FND-native classes)
- 21-edge whitelist incl. OMG Commons hasIdentifier / hasPart, plus
FND-specific Party / Address / Currency / Contract predicates
- 3 smoke tests verify load-bearing IRIs resolve (MonetaryAmount,
Currency, AmountOfMoney, ExchangeRate, InterestRate, Person,
cmns-org:LegalPerson)
bO-7 FIBO-BE (OGIT::FIBOBE_V1, slot 21):
- data/ontologies/fibo-be/ (~52 RDF/XML files, ~1.3 MB)
- entity_count = 1964 after hydration
- 17-edge whitelist incl. ownership / control / corporate-governance
predicates plus FND-inherited foundation predicates
- 3 smoke tests verify Corporation / Partnership / Trust / LegalEntity
/ BusinessEntity resolve
Both modules declare inherits_from: Some(OGIT::DOLCE_V1.0) directly.
A future PR can chain BE → FND → DOLCE once multi-parent inherits_from
chains land in the registry.
Test surface (6 new tests, all green); cargo clippy clean; full
lance-graph-ontology suite (84 tests) plus downstream consumers
(callcenter, consumer-conformance, cognitive-shader-driver) build clean.
Slot allocation: FIBOFND=20, FIBOBE=21 (above the L2 universal block
at 10-13, leaves 7-9 + 14-19 reserved for future inter-layer ontologies).
License: MIT (EDM Council). Attribution in LICENSES/FIBO.txt.
Adds the canonical QUDT 2.1 quantitykinds catalogue (~1240 quantity-kind
individuals, 1.92 MB) at data/ontologies/qudt-quantitykinds.ttl. This was
the missing third leg of bO-4: the prior PR shipped core + units only
because the uploaded quantitykinds file was a byte-identical duplicate of
units (same MD5).
hydrate_qudt() now passes all three artifacts (core + units +
quantity-kinds) to OwlHydrator::hydrate_many, merging them into one bundle
keyed by OGIT::QUDT_V1.0.
Test additions:
- Smoke test threshold raised from >2000 to >4000 entities (was ~3000
with core+units; now substantially higher with quantitykinds)
- New qudt_si_quantitykinds_resolve test asserts the seven SI base
quantitykinds (Length, Mass, Time, ElectricCurrent, Temperature,
AmountOfSubstance, LuminousIntensity) plus common derived ones (Force,
Energy, Power, Pressure, Frequency, Voltage, Velocity) resolve under
G=QUDT_V1
LICENSES/QUDT.txt updated to reflect full three-file coverage.
All 85 lance-graph-ontology tests pass; cargo clippy clean.
Two additions on the Pattern-D substrate:
bO-5 SKOS (OGIT::SKOS_V1, slot 14):
- data/ontologies/skos/skos-core.rdf — canonical SKOS Core (32 classes
+ properties: Concept, ConceptScheme, Collection, OrderedCollection,
broader/narrower/related/{broader,narrower}Transitive, in/topConcept,
prefLabel/altLabel/hiddenLabel, notation, the *Match mapping family)
- data/ontologies/skos/skos-xl.rdf — SKOS-XL eXtension for Labels
(lifts labels to first-class IRIs via Label/literalForm/labelRelation)
- data/ontologies/skos/skos-owl1dl.rdf — OWL-1-DL-conformant variant
shipped for tools that require strict DL conformance (not used by
the default hydration)
- 27-edge whitelist (subClassOf + subPropertyOf + 23 SKOS semantic-relation
/ mapping / label predicates) — load-bearing for SKR03/SKR04 alignment
- LICENSES/SKOS.txt (W3C document license)
- 4 smoke tests verify Core classes, XL Label surface, and the *Match
predicates resolve
DUL extension modules merged into OGIT::DOLCE_V1:
- data/ontologies/dul-extensions/conceptualization.owl — agent
conceptualization patterns (knows / believes / assumes / adopts +
InternalRepresentation class). Refines the dul:conceptualizes surface
used by cognitive-shader agency cascades.
- data/ontologies/dul-extensions/lmm-l2.owl — Lexical MetaModel L2.
Adds NER surface (NamedEntity / Name / ConceptExpression /
ContextualExpression / IndividualReference / MultipleReference /
Gloss / hasSyntacticFunction / hasInstance / isInstanceOf).
- New `hydrate_dolce_from_many` API that merges DUL.ttl + extensions
into a single ContextBundle keyed by OGIT::DOLCE_V1.0 via
OwlHydrator::hydrate_many. Extensions are loaded best-effort
(skipped if missing) so the path works under either deployment.
- 3 smoke tests verify the Conceptualization and LMM_L2 IRIs resolve
under G=DOLCE_V1, and that the merged entity count is >= 240.
Format detection improvement:
- OwlHydrator::detect_format now accepts bytes and content-sniffs the
first 256 bytes for `<?xml` / `<rdf:RDF` when the extension is
ambiguous (.owl files exist in both Turtle and RDF/XML form in the
wild; the DUL extensions are RDF/XML-with-.owl). Existing routes by
extension (.rdf / .ttl / .nt) are unchanged.
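
A rough sketch of the extension-first, sniff-second routing described above. The `RdfFormat` enum, the function signature, and the Turtle fallback for unrecognized content are assumptions made for illustration; the real `OwlHydrator::detect_format` may differ.

```rust
use std::path::Path;

/// Hypothetical set of parser routes; not the crate's actual enum.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RdfFormat {
    Turtle,
    RdfXml,
    NTriples,
}

/// Route by extension when it is unambiguous, otherwise look at the bytes.
pub fn detect_format(path: &Path, bytes: &[u8]) -> RdfFormat {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("rdf") => RdfFormat::RdfXml,
        Some("ttl") => RdfFormat::Turtle,
        Some("nt") => RdfFormat::NTriples,
        // .owl and anything else: extension says nothing reliable.
        _ => sniff(bytes),
    }
}

/// Inspect at most the first 256 bytes for XML / RDF-XML markers.
fn sniff(bytes: &[u8]) -> RdfFormat {
    let head = &bytes[..bytes.len().min(256)];
    let text = String::from_utf8_lossy(head);
    if text.contains("<?xml") || text.contains("<rdf:RDF") {
        RdfFormat::RdfXml
    } else {
        // Assumed default, mirroring "everything else goes to the Turtle parser".
        RdfFormat::Turtle
    }
}
```

Under these assumptions, `detect_format(Path::new("x.owl"), b"<?xml ...")` takes the RDF/XML route, while a `.ttl` path is never sniffed regardless of content.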
All 92 lance-graph-ontology tests pass; cargo clippy clean.
… XsdHydrator
First Tier-C hydrator (per spec) — handles XSD-shaped schemas where
the OwlHydrator's TTL/RDF-XML routes don't apply. ZUGFeRD/Factur-X
EN16931 is the German hybrid PDF/A-3 + XML invoice format aligned
with EU directive EN 16931; the underlying XML schema is UN/CEFACT
CrossIndustryInvoice (CII) v100.
New XsdHydrator (minimal name-extraction shape):
- crates/lance-graph-ontology/src/hydrators/xsd.rs
- Walks every <xs:element>/<xs:complexType>/<xs:simpleType>/<xs:attribute>/
<xs:attributeGroup> declaration via quick-xml streaming, interns each
as `{targetNamespace}#{name}` IRIs into the existing ContextBundle
surface
- Adds quick-xml = "0.37" as a direct dep (already transitively present
via oxrdfxml)
- collect_xsd_files() walks a directory tree for `.xsd`, sorted for
stable interning order
- Unit test verifies a tiny inline XSD interns 5 named declarations
- Type-graph semantics (xs:extension base / xs:restriction base) NOT
resolved into rdfs:subClassOf-equivalent edges — deferred follow-up,
documented in xsd.rs module docs
bO-16 ZUGFeRD (OGIT::ZUGFERD_V1, slot 30):
- data/ontologies/zugferd/ — 4 XSD files (top-level CII + RAM + QDT +
UDT) from the Factur-X 1.08 EN16931 profile, plus FACTUR-X_EN16931.sch
Schematron rules and FACTUR-X_EN16931_codedb.xml code-list database
(latter two shipped for future SchematronHydrator / CodeListHydrator
PRs, not hydrated today)
- Source: LandrixSoftware/validator-configuration-zugferd (Apache-2.0)
- entity_count > 200 after hydration of all 4 XSDs
- 17-edge cascade whitelist covers the three CII top-level relational
containers (ExchangedDocument / ExchangedDocumentContext /
SupplyChainTradeTransaction) plus the 14 most load-bearing RAM
predicates (Seller/Buyer TradeParty, TradeAgreement / TradeDelivery /
TradeSettlement headers, line-item containers, tax / payment terms,
monetary summation)
- 5 smoke tests verify CII root, RAM types (Header*TradeAgreementType,
TradePartyType, TradeAddressType, TradeTaxType, ...), all four CII
namespaces present, edge whitelist registered
LICENSES/ZUGFERD.txt cites Apache-2.0 (LandrixSoftware config) +
UN/CEFACT IP policy for the underlying CII XSDs.
All 98 lance-graph-ontology tests pass; cargo clippy clean; downstream
consumers build clean.
Second Tier-C hydrator. Schematron (ISO/IEC 19757-3) is the rule-
language layer that sits on top of XSD across every major e-business
standard — EN 16931, PEPPOL, XRechnung, ZUGFeRD, UBL, ISO 20022 all
ship structural XSD + behavioral .sch business rules. This hydrator
generalizes the rule-extraction shape so each future spec only needs
glue, not new hydrator code.
New SchematronHydrator (minimal name-extraction shape):
- crates/lance-graph-ontology/src/hydrators/schematron.rs
- Walks `<assert id="..." test="..." flag="...">` and
`<report id="...">` and `<pattern id="...">` elements via quick-xml
streaming, interns each as `{base_iri}/{element}/{id}`
- Additionally scans text bodies for bracketed business-rule IDs
like [BR-52], [BR-CO-03], [BR-DE-1], [PEPPOL-EN16931-R008] —
these are the canonical EN16931 / PEPPOL / DE / VAT-category rule
identifiers used by cross-spec alignment. Each distinct ID interns
as `{base_iri}/rule/{rule-id}`.
- Strict business-rule ID validator (uppercase-only, dash-segmented,
no leading digit / double dash / trailing dash) keeps the rule
namespace clean of false positives from message text.
- 3 unit tests verify the validator + end-to-end interning on a tiny
in-memory .sch fixture.
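
The validator rules listed above (uppercase-only, dash-segmented, no leading digit, no double or trailing dash) can be sketched with the standard library alone. The function names are hypothetical, and requiring at least one dash is an assumption read into "dash-segmented"; the shipped validator in schematron.rs may be stricter or looser.

```rust
/// True when `id` looks like a business-rule identifier such as
/// BR-52 or PEPPOL-EN16931-R008 (hypothetical helper).
pub fn is_rule_id(id: &str) -> bool {
    let starts_with_letter = id
        .chars()
        .next()
        .map_or(false, |c| c.is_ascii_uppercase());
    starts_with_letter
        && id.contains('-') // assumption: at least two segments
        && id.split('-').all(|seg| {
            // An empty segment means a double, leading, or trailing dash.
            !seg.is_empty()
                && seg
                    .chars()
                    .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
        })
}

/// Collect distinct bracketed rule IDs from message text, first-seen order.
pub fn extract_rule_ids(text: &str) -> Vec<String> {
    let mut found: Vec<String> = Vec::new();
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        let after = &rest[open + 1..];
        match after.find(']') {
            Some(close) => {
                let candidate = &after[..close];
                if is_rule_id(candidate) {
                    if !found.iter().any(|s| s == candidate) {
                        found.push(candidate.to_string());
                    }
                    rest = &after[close + 1..];
                } else {
                    // Not a rule ID: retry from just past this '['.
                    rest = after;
                }
            }
            None => break, // unbalanced bracket, nothing more to scan
        }
    }
    found
}
```

Each ID returned by `extract_rule_ids` would then be interned under the `{base_iri}/rule/{rule-id}` shape named in the commit message.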
bO-15 ZUGFeRD rules (OGIT::ZUGFERDRULES_V1, slot 31):
- hydrate_zugferd_rules() points the SchematronHydrator at the
FACTUR-X_EN16931.sch file that already shipped with bO-16.
- Base IRI urn:schematron:factur-x-1.08-en16931 — stable URN for
downstream alignment to PEPPOL / EN16931 / FeRD rule registries.
- inherits_from: Some(OGIT::ZUGFERD_V1.0) — rule namespace is
meaningless without the structural CII namespace.
- 5 smoke tests verify the EN16931 anchor rules (BR-52 / BR-45 /
BR-CO-03 / BR-CO-17 / BR-S-08 / BR-Z-08 / BR-DEC-19), PEPPOL
extension (PEPPOL-EN16931-R008), schema-level FX-SCH-A-* IDs that
KoSIT validator output references directly, and rule-family
coverage (BR-CO / BR-DEC / BR-S / BR-Z / PEPPOL).
- ~510 IRIs hydrated: 301 distinct schema assert IDs (428 elements
collapse to 301 — some IDs are reused across patterns) + ~209
distinct bracketed business-rule IDs.
Inspection findings documented in tests + LICENSES/ZUGFERD.txt:
- 191 <report> elements present but NONE have @id (Schematron-1
style; text bodies still contribute business-rule IRIs).
- 0 patterns / 0 phases with @id.
Side fix: factored out the XsdHydrator interning-closure type alias
to clear a clippy type_complexity warning. Now 5 clippy warnings,
all pre-existing oxrdf deprecations.
All 106 lance-graph-ontology tests pass; downstream consumers build
clean.
Converts two DATEV Standardkontenrahmen PDFs (German standard chart of
accounts) to machine-readable CSV, ready for future hydration as the
PR-bO-13 SKR slot in lance-graph-ontology.
Data files (data/ontologies/skr-datev/):
skr04.csv 1232 accounts. SKR 04 generic (Abschlussgliederungs-
prinzip), DATEV Art.-Nr. 11175, valid 2023. Family
numbering 0-9 maps to balance-sheet structure:
0=Anlagevermögen, 1=Umlaufvermögen, 2=Eigenkapital,
3=Fremdkapital, 4=Erträge, 5=Materialaufwand,
6=Personalaufwand, 7=Finanzergebnis, 9=Statistische.
skr03.csv 1476 accounts. Canonical SKR 03 (Prozessgliederungs-
prinzip), extracted as the NN=00 subset of the Bau
variant. Family numbering follows process-oriented
scheme: 0=Anlage/Kapital, 1=Finanz/Privat, 2=Abgrenzung,
3=Wareneingang/Bestände, 4=Betriebliche Aufwendungen,
5/6=frei, 7=Erzeugnisbestände, 8=Erlöse, 9=Vortrags/
Statistische.
skr03-bau.csv 1686 accounts. Full SKR 03 Bau und Handwerk (DATEV
Art.-Nr. 19606, 2026 edition). Uses 6-digit account
format NNNN+NN. 210 of these are trade-specific
extensions (NN>00) on top of the canonical 1476.
parse_pdfs.py Python script that produced the CSVs. Character-
column slicing of `pdftotext -layout` output. Depends
on poppler-utils for the underlying extraction.
README.md Documents file layout, family-numbering differences
between SKR 03 and SKR 04, source artifact refs,
known data-quality issues (~12% of entries have
multi-column bleeding in account_name and may need
hand-cleanup; account NUMBERS are 100% reliable).
LICENSES/DATEV-SKR.txt documents the legal basis: only the underlying
chart-data is redistributed (account numbers + descriptive names), not
DATEV's copyrighted PDF formatting / layout / typography. Account numbers
are public-domain bookkeeping standard identifiers; account names are
descriptive labels for HGB-defined accounting concepts.
Next step (NOT in this commit): PR-bO-13 hydrator that interns each
account as `urn:datev:skr04:account/{number}` and
`urn:datev:skr03:account/{number}` IRIs into the OntologyRegistry,
with family classifications as SKOS Collections and cross-walks to
FIBO MonetaryAmount + ZUGFeRD invoice projections as separate axioms.
Third Tier-C hydrator. Reads the SKR 03 / SKR 04 CSVs that landed in
the previous commit and interns each account as a stable u32 IRI in
the OntologyRegistry. The two schemes hydrate into SEPARATE G slots
(SKR03_V1=40, SKR04_V1=41) because the same account number means
different things across them — e.g. account 1000 is "Kasse" in SKR 03
but "Roh-, Hilfs- und Betriebsstoffe" in SKR 04.
New SkrHydrator (minimal name-extraction shape):
- crates/lance-graph-ontology/src/hydrators/skr.rs — generic CSV-driven
hydrator. Streams a CSV via a small hand-rolled parser (handles
double-quoted fields with embedded commas + "" escape), interns
`{iri_prefix}/{account_number}` for each row.
- No new external dep — CSV parsing is ~30 lines of state-machine.
- 2 unit tests cover the CSV parser and a tiny end-to-end hydration.
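
As an illustration of the hand-rolled CSV state machine mentioned above, the following sketch splits a single record while honoring double-quoted fields, embedded commas, and the `""` escape. The function names and the assumption that the account number is the first column are hypothetical; the real parser in skr.rs is not reproduced here.

```rust
/// Split one CSV record into fields (no multi-line field support).
pub fn parse_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match (in_quotes, c) {
            (true, '"') => {
                if chars.peek() == Some(&'"') {
                    // "" inside a quoted field is a literal quote.
                    current.push('"');
                    chars.next();
                } else {
                    in_quotes = false;
                }
            }
            (true, _) => current.push(c),
            (false, '"') => in_quotes = true,
            (false, ',') => fields.push(std::mem::take(&mut current)),
            (false, _) => current.push(c),
        }
    }
    fields.push(current);
    fields
}

/// Build `{iri_prefix}/{account_number}` from a record, assuming the
/// account number sits in the first column (an assumption of this sketch).
pub fn account_iri(iri_prefix: &str, line: &str) -> Option<String> {
    let fields = parse_csv_line(line);
    let number = fields.first()?.trim();
    if number.is_empty() {
        return None;
    }
    Some(format!("{}/{}", iri_prefix, number))
}
```

For example, `account_iri("urn:datev:skr03:account", "1000,Kasse")` produces `urn:datev:skr03:account/1000`.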
New glue (hydrators/skr_datev.rs) ships three entry points:
- hydrate_skr03 → OGIT::SKR03_V1, IRI base `urn:datev:skr03:account`
- hydrate_skr04 → OGIT::SKR04_V1, IRI base `urn:datev:skr04:account`
- hydrate_skr03_bau → also OGIT::SKR03_V1 but with a separate IRI base
`urn:datev:skr03-bau:account` so 6-digit Bau accounts coexist with
canonical 4-digit ones without clashing.
CSV data regeneration (data/ontologies/skr-datev/parse_pdfs.py):
- Widened SKR 03 Bau column slice from [44, 90) to [38, 100). The old
bound cut off account numbers preceded by function-code prefixes
("F 1000 00 Kasse" started at col 41; slicing at 44 dropped the
function code AND the first digit). SKR 03 canonical entity count
grew 1476 -> 1623.
- SKR 04: unchanged.
Manifests + slots:
- modules/skr03/manifest.yaml — declares anchor accounts (1000 Kasse,
1200 Bank, 1400 Forderungen LL, 1576 Vorsteuer 19%, 3300/8400/8300
revenues) with stable u16 entity IDs for external code reference.
- modules/skr04/manifest.yaml — declares balance-sheet-oriented anchors
(1000 RHB, 1200 Forderungen LL, 1400 Vorsteuer, 1600 Kasse, 1800 Bank,
2900 Eigenkapital, 3300 Verbindlichkeiten, 4400 Erlöse, 5400 Wareneingang)
- crates/lance-graph-contract/build.rs: adds SKR03=40 + SKR04=41 to the
G-slot table.
6 smoke tests verify:
- both schemes hydrate into their respective G slots with the right
domain name and DOLCE inheritance
- entity counts in expected range (~1500 each)
- load-bearing anchor accounts resolve under their scheme-specific
IRIs (Kasse / Bank / Forderungen / Vorsteuer / Wareneingang / Erlöse)
- SKR 03 / SKR 04 are independent slots — SKR03 IRIs don't resolve in
the SKR04 bundle and vice versa
- SKR 03 Bau extensions (007510 Sand- und Kiesausbeute, 010010 Bauliche
Anlagen für stationäre Fertigung) resolve under the Bau IRI base
Known parser coverage gaps (documented in test comments):
- SKR 04 account 2900 "Gezeichnetes Kapital" not extracted (page-8
column boundary issue)
- SKR 04 account 4400 entangled in multi-account bleed at 4332
Both are CSV data quality issues, not hydrator bugs. The hydrator is
correct; future parse_pdfs.py refinement will close these gaps.
All 114 lance-graph-ontology tests pass; clippy clean; downstream
consumers (callcenter, consumer-conformance, cognitive-shader-driver)
build clean.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 299e6c43b6
    g: OGIT::SKR03_V1.0,
    version: OGIT::SKR03_V1.1,
    domain_name: "skr03-bau".to_string(),
    inherits_from: Some(OGIT::DOLCE_V1.0),
    starting_entity_id: 100,
    iri_prefix: SKR03_BAU_IRI_PREFIX.to_string(),
};
h.hydrate(csv_path, registry)?;
Merge SKR03 Bau rows instead of overwriting SKR03 bundle
hydrate_skr03_bau_from registers its data into the same G slot (OGIT::SKR03_V1.0) as hydrate_skr03, but register_bundle replaces any existing bundle at that slot. In practice, if a caller hydrates canonical SKR03 and then adds Bau extensions (which the docs/tests describe as coexisting via a different IRI prefix), the second call drops all canonical 4-digit SKR03 accounts and leaves only Bau rows. This breaks mixed consumers that expect both account sets in one SKR03 context.
Ok(Event::Start(e)) | Ok(Event::Empty(e)) => {
    let qname = e.name();
    let local: Vec<u8> = local_name(qname.as_ref()).to_vec();
    let id = attr_value(&e, b"id");
    match local.as_slice() {
Reset assert/report state for self-closing Schematron nodes
The parser handles Event::Start and Event::Empty with the same branch and always sets in_assert_or_report = true for assert/report, but self-closing elements (<assert .../> or <report .../>) never emit a matching End event. That leaves the state stuck "inside" an assertion, so later unrelated text can be scanned as rule text and produce spurious /rule/... IRIs. This is input-dependent but can corrupt rule extraction for valid XML event streams.
P1 — SKR03 Bau no longer overwrites canonical SKR03 slot:
Before: hydrate_skr03_bau registered into OGIT::SKR03_V1 (same slot
as canonical), so a caller that hydrated both lost the canonical
4-digit account set on the second call. The test only invoked Bau
alone so the silent overwrite went undetected.
After: SKR 03 Bau hydrates into its OWN G slot OGIT::SKR03BAU_V1=42,
with inherits_from: Some(OGIT::SKR03_V1.0) to make the structural
dependency on canonical SKR 03 explicit. Mixed consumers can now
hold canonical SKR 03 (4-digit, slot 40) AND Bau (6-digit, slot 42)
in the same OntologyRegistry without interference.
- crates/lance-graph-contract/build.rs: allocates SKR03BAU=42
- crates/lance-graph-ontology/src/hydrators/skr_datev.rs: hydrate_skr03_bau
writes to OGIT::SKR03BAU_V1.0 with the new inherits_from
- modules/skr03-bau/manifest.yaml: declares 5 anchor trade-specific
accounts (Sand- und Kiesausbeute, Bauliche Anlagen für stationäre
Fertigung, Bauten auf dem Bauhof) with stable u16 entity IDs
- tests: skr03_bau_extensions_hydrate_into_skr03_slot renamed to
skr03_bau_extensions_hydrate_into_dedicated_slot (expects slot 42)
- NEW REGRESSION TEST: skr03_canonical_and_bau_coexist_in_one_registry
hydrates both schemes in sequence, asserts:
- canonical bundle still has 1400+ entities (not dropped)
- canonical SKR 03/1000 (Kasse) resolves in canonical bundle
- Bau /007510 (Sand- und Kiesausbeute) resolves in Bau bundle
- Bau IRIs do NOT resolve in canonical bundle (and vice versa)
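
The P1 failure mode and its fix come down to replace-on-register semantics. The following toy registry is a sketch under assumptions: `Registry`, its storage layout, and the method signatures are hypothetical; only the method names and the slot numbers 40 and 42 come from the commit text.

```rust
use std::collections::HashMap;

/// Slot numbers as stated in the commit message.
pub const SKR03: u16 = 40;
pub const SKR03_BAU: u16 = 42;

/// Toy registry: one bundle of IRIs per G slot (hypothetical layout).
#[derive(Default)]
pub struct Registry {
    bundles: HashMap<u16, Vec<String>>,
}

impl Registry {
    /// Replace semantics: registering at an occupied slot drops the
    /// bundle that was there before.
    pub fn register_bundle(&mut self, slot: u16, iris: Vec<String>) {
        self.bundles.insert(slot, iris);
    }

    /// Position of `iri` inside the bundle at `slot`, if both exist.
    pub fn resolve_iri_in(&self, slot: u16, iri: &str) -> Option<usize> {
        self.bundles.get(&slot)?.iter().position(|i| i == iri)
    }
}
```

Registering canonical and Bau rows at the same slot leaves only the Bau rows resolvable, which is the overwrite the review flagged; registering them at 40 and 42 keeps both bundles intact and mutually invisible.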
P2 — Schematron Event::Empty no longer leaks state into later text:
Before: <assert .../>, <report .../>, <pattern .../> (self-closing)
emitted Event::Empty handled with the same branch as Event::Start
and set in_assert_or_report = true. Self-closing elements never
emit a matching Event::End, so the flag stayed stuck true and the
parser scanned later unrelated text body as rule message text,
producing spurious /rule/... IRIs.
After: Event::Start and Event::Empty are split into separate match
arms. Empty interns the @id only (no body to collect) and does NOT
touch in_assert_or_report. Start sets the flag; End extracts rule
IDs from current_text_buf and resets state. Self-closing elements
no longer affect state.
- crates/lance-graph-ontology/src/hydrators/schematron.rs
- NEW REGRESSION TEST: self_closing_assert_does_not_capture_later_text
constructs a fixture with a self-closing <assert/> followed by
a non-assert <bar> element containing "[BR-99]" text, then a
normal Start/End assert containing "[BR-42]":
- A-EMPTY-001 and A-NORMAL-001 assert IRIs both resolve
- BR-42 (real rule from message body) resolves
- BR-99 (from stray text outside any assert) does NOT resolve
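
The split between Start and Empty handling can be sketched without any XML dependency. The `Event` enum below is a hypothetical stand-in that only mirrors the Start / Empty / End / Text shape of a streaming reader; it is not the quick-xml API, and `walk` is not the shipped parser.

```rust
/// Hypothetical event type for the sketch.
pub enum Event<'a> {
    Start { name: &'a str, id: Option<&'a str> },
    Empty { name: &'a str, id: Option<&'a str> },
    End { name: &'a str },
    Text(&'a str),
}

#[derive(Default, Debug)]
pub struct Extracted {
    pub element_ids: Vec<String>,
    pub rule_texts: Vec<String>,
}

fn carries_id(name: &str) -> bool {
    matches!(name, "assert" | "report" | "pattern")
}

fn has_rule_body(name: &str) -> bool {
    matches!(name, "assert" | "report")
}

pub fn walk(events: &[Event<'_>]) -> Extracted {
    let mut out = Extracted::default();
    let mut in_assert_or_report = false;
    let mut current_text_buf = String::new();
    for event in events {
        match event {
            Event::Start { name, id } => {
                if let (true, Some(id)) = (carries_id(name), id) {
                    out.element_ids.push(id.to_string());
                }
                if has_rule_body(name) {
                    in_assert_or_report = true;
                    current_text_buf.clear();
                }
            }
            // Self-closing: intern the id, never touch the flag.
            Event::Empty { name, id } => {
                if let (true, Some(id)) = (carries_id(name), id) {
                    out.element_ids.push(id.to_string());
                }
            }
            Event::Text(text) => {
                if in_assert_or_report {
                    current_text_buf.push_str(text);
                }
            }
            Event::End { name } => {
                if has_rule_body(name) && in_assert_or_report {
                    out.rule_texts.push(std::mem::take(&mut current_text_buf));
                    in_assert_or_report = false;
                }
            }
        }
    }
    out
}
```

Fed the regression fixture's sequence (self-closing assert, stray text in a non-assert element, then a normal assert), this sketch collects both assert IDs but only the text body of the normal assert.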
Side fix: replaced `if !map.contains_key(&iri) { map.insert(iri, id) }`
in SkrHydrator with the Entry::Vacant pattern to clear a clippy
`map_entry` warning. Clippy back to 5 warnings (all pre-existing
oxrdf deprecation warnings, no new ones).
All 116 lance-graph-ontology tests pass (was 114; +2 regression tests);
downstream consumers build clean.
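
For readers unfamiliar with the clippy `map_entry` lint mentioned in the side fix, here is a minimal sketch of the Entry-based shape. The `intern` function and its counter argument are hypothetical; the actual SkrHydrator code is not shown in this PR text.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Intern `iri` if absent and return the id now associated with it.
/// A single `entry` call replaces the contains_key + insert pair,
/// so the key is hashed and looked up only once.
pub fn intern(map: &mut HashMap<String, u32>, iri: String, next_id: &mut u32) -> u32 {
    match map.entry(iri) {
        Entry::Occupied(slot) => *slot.get(),
        Entry::Vacant(slot) => {
            let id = *next_id;
            *next_id += 1;
            slot.insert(id);
            id
        }
    }
}
```

Interning the same IRI twice returns the id assigned the first time and leaves the counter untouched on the second call.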
… vocabs
Companion to PR #407 (merged). Expands `NamespaceRegistry::seed_defaults`
from 16 to 29 entries, registering the 13 external vocabularies that
PR #407 added hydrators for. This is the O(1) IRI ↔ context_id matching
table backed by `lance_cache.rs`'s Lance dataset; consumers like
smb-office-rs and woa-rs lookup by namespace shortname instead of
hand-rolling slot constants.
Why this lives in lance-graph-ontology, not in OGIT:
- Public OWL/RDF source files stay pristine in data/ontologies/
(DOLCE+DUL, FIBO-FND/BE, OWL-Time, PROV-O, QUDT, schema.org, SKOS,
ZUGFeRD CII XSDs + Schematron). Modifying them taints downstream use.
- The OGIT repo is authoritative for namespace registrations but adding
new TTL files there with hand-picked contextIds would be drift.
- The matching table belongs in the CLIENT (lance-graph-ontology), keyed
by namespace shortname, persisted via the existing lance_cache layer.
- Per user direction 2026-05-21: "expand always but drift is probably
bad" + "deinterlace them locally and keep that matching table in a
lance table for O(1) and check what lance-graph-ontology has in
regards" → expansion lives here, OGIT untouched.
Allocation:
ID     Namespace                          PR / Hydrator
─────────────────────────────────────────────────────────
0      SMB                                (pre-existing)
1      WorkOrder                          (pre-existing)
2      Healthcare                         (pre-existing)
3      Network                            (pre-existing)
4      EmailCorrespondance                (pre-existing)
5      SharePoint                         (pre-existing)
10-19  Medical/<sub>                      (pre-existing, dense)
20     Foundation/DOLCE-DUL               bO-1   hydrate_dolce
21     Foundation/OWL-Time                bO-2   hydrate_owltime
22     Foundation/PROV-O                  bO-3   hydrate_provo
23     Foundation/QUDT                    bO-4   hydrate_qudt
24     Foundation/schema-org              bO-8   hydrate_schemaorg
25     Foundation/SKOS                    bO-5   hydrate_skos
30     FinancialAccounting/FIBO-FND       bO-6   hydrate_fibo_fnd
31     FinancialAccounting/FIBO-BE        bO-7   hydrate_fibo_be
32     FinancialAccounting/ZUGFeRD        bO-16  hydrate_zugferd
33     FinancialAccounting/ZUGFeRD-Rules  bO-15  hydrate_zugferd_rules
34     FinancialAccounting/SKR03          bO-13  hydrate_skr03
35     FinancialAccounting/SKR04          bO-13  hydrate_skr04
36     FinancialAccounting/SKR03-Bau      bO-13  hydrate_skr03_bau
Allocation policy matches the existing Medical/<sub> pattern: dense
within family-range, gaps between ranges left as expansion room.
`allocate()` continues to fill gaps 6..=9 and 26..=29 first, then 37+.
Notes:
- `next_free_id` doc-comment updated to reflect the new seed layout.
First dynamic id is now 6 (was already 6 in practice; the prior comment
said "20" which was off by 14).
- Three regression tests updated:
  * `seed_defaults_has_sixteen_entries` → `_has_twenty_nine_entries`
  * `seed_defaults_assigns_canonical_ids` adds spot-checks at
    20/25/30/34/35/36
  * `allocate_skips_to_first_unused_id` len assertion 16 → 29
- One integration test (`tests/context_id_test.rs`) updated to match.
All 116 lance-graph-ontology tests pass; clippy clean (5 pre-existing
oxrdf deprecation warnings, no new); downstream consumers (callcenter,
consumer-conformance, cognitive-shader-driver) build clean.
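
The gap-filling order stated for `allocate()` (6..=9, then 26..=29, then 37+) is what a lowest-free-id search produces over the seeded set. The sketch below assumes that behavior; the signature and the `BTreeSet` storage are hypothetical and not taken from `NamespaceRegistry`.

```rust
use std::collections::BTreeSet;

/// Claim and return the lowest id not yet in `used`.
/// Returns None only if the whole u16 range is exhausted.
pub fn allocate(used: &mut BTreeSet<u16>) -> Option<u16> {
    let mut candidate: u16 = 0;
    // BTreeSet iterates in ascending order, so the first id that
    // jumps past `candidate` marks a gap.
    for &id in used.iter() {
        if id > candidate {
            break;
        }
        if id == candidate {
            candidate = candidate.checked_add(1)?;
        }
    }
    used.insert(candidate);
    Some(candidate)
}
```

Seeding the set with 0..=5, 10..=25, and 30..=36 (29 entries, matching the table) and calling `allocate` repeatedly walks through the two gaps before moving past 36.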
…-Ce9Oa feat(ontology): seed NamespaceRegistry with bO-* upstream vocabs (PR #407 follow-up)
…st PR #407)
PR #407 + the ~11 preceding bO-* feature commits shipped the concrete
OWL/DOLCE/OGIT cross-walk hydrators in lance_graph_ontology::hydrators::*.
§4 of this plan referenced those surfaces abstractly; this commit
tightens to concrete type pointers.
§4.1 (OWL/DOLCE cross-walk surface): table now names the hydrator that
populates each MetaAnchors field:
- owl_upper_class + dolce_marker → hydrate_dolce (OGIT::DOLCE_V1,
inherits_from: None, 17-IRI edge whitelist covering rdfs:subClassOf +
owl:equivalentClass + DnS classify/role-binding + dul:hasPart/isPartOf
+ dul:hasTimeInterval/isObservableAt; note the canonical DOLCE+DUL
Endurant→Object/Perdurant→Event rename).
- foundry_object_type → hydrate_schemaorg + hydrate_fibo_be for the
upper-class anchors Foundry typically maps to.
- wikidata_qid → not yet in the hydrator surface; deferred until a
tenant requests Wikidata sync (~50 LOC glue).
§4.3 (NEW: Hydrator inventory): full surface map:
- Generic substrate: OwlHydrator (the bO-* scaffold every hydrator
instantiates), MetaStructureHydrator trait, ContextBundle, EntityId,
OntologySlot, HydrateErr.
- Layered ontologies (L1 → L4 sector):
  · L1: hydrate_dolce (root, inherits_from: None)
  · L2: hydrate_owltime / hydrate_provo / hydrate_qudt (all
    inherits_from: Some(OGIT::DOLCE_V1.0))
  · L3: hydrate_schemaorg (commercial-web)
  · Sector: hydrate_skos, hydrate_fibo_fnd, hydrate_fibo_be (FIBO BE
    inherits FND)
- Dedicated (non-OWL): SchematronHydrator, XsdHydrator +
collect_xsd_files, SkrHydrator + hydrate_skr03/skr04/skr03_bau + the
three IRI prefix constants.
- ZUGFeRD/Factur-X: hydrate_zugferd + hydrate_zugferd_rules (XSD +
Schematron over EN16931).
- Full re-export surface from the crate root shown as a single `use`
block for consumer ergonomics.
§4.3 also maps each plan deliverable to the hydrators it uses:
- D-UB-1 names the producer-side shape.
- D-UB-2 (SmbBridge) declares no hydrator dep until OGIT/NTO/SMB/ ships.
- D-UB-3 (lance_cache::ontology_cache_schema) persists each hydrator's
ContextBundle output as the Lance column rows.
- D-UB-4..6 (per-consumer constructors) take an already-hydrated
Arc<OntologyRegistry>; deployment chooses the menu:
  · woa-rs: dolce + provo + qudt + schemaorg + fibo_fnd + skr03/skr04 +
    future OGIT/NTO/WorkOrder.
  · smb-office-rs: same minus WorkOrder, plus skr03_bau + zugferd +
    zugferd_rules.
  · MedCare-rs: dolce + owltime + provo + qudt + skos + future
    OGIT/NTO/Healthcare.
No deliverable IDs renumbered; this is a clarification of §4's
referenced surface against the now-shipped types. Other plan sections
unchanged.
Summary
Ten of seventeen planned bO-* ontology hydrators ship in this branch,
plus two brand-new Tier-C hydrator types (XsdHydrator,
SchematronHydrator) that unlock the remaining XSD-based and rule-based
specs (UBL, ISO 20022, XRechnung, GoBD).
- DOLCE_V1
- OWLTIME_V1
- PROVO_V1
- QUDT_V1
- SKOS_V1
- FIBOFND_V1
- FIBOBE_V1
- SCHEMAORG_V1
- SKR03_V1 + SKR04_V1
- ZUGFERDRULES_V1
- ZUGFERD_V1
New hydrator types (Tier-C unlocks)
- XsdHydrator (hydrators/xsd.rs): streams XSD via quick-xml, interns
<xs:element> / <xs:complexType> / <xs:simpleType> / <xs:attribute> /
<xs:attributeGroup> names as {targetNamespace}#{name} IRIs. Reusable
for UBL, ISO 20022, etc.
- SchematronHydrator (hydrators/schematron.rs): walks .sch files,
interns assert/report/pattern @id plus bracketed business-rule IDs from
message text ([BR-52], [BR-CO-03], [PEPPOL-EN16931-R008]). Reusable for
XRechnung, GoBD, and any ISO/IEC 19757-3 spec.
- SkrHydrator (hydrators/skr.rs): CSV-driven hydrator for DATEV SKR
charts of accounts. SKR 03 and SKR 04 go into separate G slots because
the same account number means different things across the two schemes
(1000 = Kasse in SKR03 vs RHB in SKR04).
Format-detection improvement
OwlHydrator::detect_format now content-sniffs the first 256 bytes for
<?xml / <rdf:RDF when the extension is ambiguous (.owl files exist in
both Turtle and RDF/XML form in the wild). Existing .rdf / .ttl
extension-based routes are unchanged.
Data drops
- data/ontologies/dul-extensions/: Conceptualization + LMM_L2 (CC-BY)
- data/ontologies/skos/: Core + XL + OWL-1-DL variant (W3C)
- data/ontologies/zugferd/: 4 CII XSDs + Schematron + codelist
(Apache-2.0, LandrixSoftware)
- data/ontologies/skr-datev/: SKR 03 (1623), SKR 03 Bau (1880), SKR 04
(1232) accounts as CSV + parse_pdfs.py reproducibility script + README
documenting known data-quality gaps
What's NOT here (deferred)
- … XbrlHydrator (taxonomy-specific extension of XsdHydrator)
- … xs:extension base → rdfs:subClassOf)
- … ⊑ schema.org Invoice). Left to downstream consumers.
Test plan
- … lance-graph-callcenter, lance-graph-consumer-conformance,
cognitive-shader-driver
Consumer impact
- … NamespaceBridge, OntologyRegistry, bridges::WoaBridge) unchanged;
its lance-graph integration is blocked operationally on Docker build
context, not on API. No breaking changes.
- … schema.org Customer/Organization are now available via hydrators;
can replace hand-rolled smb-ontology schemas with composite hydrator
calls once consumer chooses to wire them in.
Branch ancestry
375b87a (bO-1 DOLCE) → 67f7dc4 (bO-2/3/4/8) → 465361e (bO-6/7 …) →
ea9c368 (bO-4 quantitykinds) → e041d0d (bO-5 + DUL extensions) →
9e65307 (bO-16 + XsdHydrator) → d4589aa (bO-15 …) →
6f983db (bO-13 data) → 299e6c4 (bO-13 SkrHydrator).
10 commits, ~12k insertions of mostly data + tests, 114 tests pass.
Generated by Claude Code